[ET-VK] Add cooperative matrix dispatch for quantized linear #19892
Draft
xuyanwen2012 wants to merge 1 commit into
Adds coopmat shaders and dispatch for 4-bit (q4gsw, dq8ca_q4gsw) and 8-bit (dq8ca_q8csw) quantized linear, gated on Adapter::supports_cooperative_matrix(), wave64 subgroup size, buffer output storage, and coopmat tile alignment — mirroring the fp16 coopmat path from pytorch#19009. Ineligible shapes fall back to the existing tiled shaders. Review order: QuantizedLinear.cpp for the dispatch gate (can_use_q4gsw_coopmat), then the linear_*_coopmat.glsl shaders, op_registry.py / custom_ops_lib.py / patterns for registration, then the tests.
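The dispatch gate described above is a conjunction of device, layout, and shape checks. The following is a minimal Python sketch of that logic. It also folds in the half-dtype and no-bias conditions listed in the Summary. It is an illustration under stated assumptions, not the PR's code: LinearDispatchInfo, can_use_coopmat, and select_shader are hypothetical names, and the 16×16×16 tile sizes are assumed values. The real check is can_use_q4gsw_coopmat in QuantizedLinear.cpp, and its exact conditions and tile sizes may differ.

```python
from dataclasses import dataclass

# Hypothetical cooperative-matrix tile sizes (M x N x K). The PR's actual
# values are defined in its shaders / C++ and may differ.
TILE_M, TILE_N, TILE_K = 16, 16, 16
WAVE64 = 64


@dataclass(frozen=True)
class LinearDispatchInfo:
    """Hypothetical bundle of the facts the dispatch gate inspects."""
    supports_coopmat: bool  # stand-in for Adapter::supports_cooperative_matrix()
    subgroup_size: int      # 64 on a wave64 device such as RDNA3
    output_is_buffer: bool  # buffer (not texture) output storage
    dtype: str              # "half" or "float"
    has_bias: bool
    M: int
    N: int
    K: int


def can_use_coopmat(info: LinearDispatchInfo) -> bool:
    """Return True only when every eligibility condition holds."""
    return (
        info.supports_coopmat
        and info.subgroup_size == WAVE64
        and info.output_is_buffer
        and info.dtype == "half"
        and not info.has_bias
        # Each GEMM dimension must be a whole number of coopmat tiles.
        and info.M % TILE_M == 0
        and info.N % TILE_N == 0
        and info.K % TILE_K == 0
    )


def select_shader(info: LinearDispatchInfo) -> str:
    # Ineligible shapes fall back to the existing tiled shader.
    return "coopmat" if can_use_coopmat(info) else "tiled"


if __name__ == "__main__":
    unaligned = LinearDispatchInfo(True, 64, True, "half", False, M=1, N=4096, K=4096)
    aligned = LinearDispatchInfo(True, 64, True, "half", False, M=128, N=4096, K=4096)
    print(select_shader(unaligned), select_shader(aligned))
```

With these assumed tile sizes, an M=1 input fails the alignment check and falls back to the tiled shader, while M=128 is eligible. Writing the gate as a single conjunction keeps every fallback reason in one place. This makes it easy to compare against can_use_q4gsw_coopmat during review.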
Summary
Adds KHR cooperative-matrix dispatch for quantized linear on the Vulkan backend, extending the fp16 coopmat path from #19009 to quantized weights:
- linear_q4gsw_coopmat (fp16 activations × INT4 weights) and linear_dq8ca_q4gsw_coopmat (8-bit dynamically quantized activations × INT4 weights)
- linear_dq8ca_q8csw_coopmat (8-bit dynamically quantized activations × INT8 weights), plus its tiled V_DOT4 fallback and op registration
Coopmat is gated on Adapter::supports_cooperative_matrix(), a wave64 subgroup, buffer output storage, half dtype, and M/N/K tile alignment. Ineligible shapes, including any with a bias, fall back to the existing tiled shaders.
Review order
QuantizedLinear.cpp (dispatch gate can_use_q4gsw_coopmat) → the linear_*_coopmat.glsl shaders → op_registry.py / custom_ops_lib.py / patterns/quantized_linear.py (registration) → the custom-op tests.
Test plan
Built against main with EXECUTORCH_BUILD_VULKAN=ON; ran the custom_ops prototyping tests on an AMD Radeon 780M (RDNA3, wave64):
- test_q4gsw_linear: 72/72 correctness cases pass
- test_dq8ca_q8csw_linear: 22/22 correctness cases pass
Per the existing convention, fp16 (coopmat-only) correctness is not asserted against the fp32 CPU reference, because the fp16 round trip diverges at near-zero and overflowing elements. The coopmat path is therefore exercised only through build, dispatch, and perf runs.
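A small standard-library Python sketch shows why an fp32 reference check would flag the fp16 path. It uses struct's half-precision "e" format to round values to fp16 and back. fp16_round_trip and rel_error are hypothetical helpers, not part of the PR's test harness. Treating overflow as a signed infinity is an assumption about what the GPU produces.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 binary16 value


def fp16_round_trip(x: float) -> float:
    """Round x to IEEE 754 half precision and back, using struct's 'e' format."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # struct refuses finite values beyond the half range; model the GPU
        # result as a signed infinity (an assumption for this sketch).
        return math.copysign(math.inf, x)


def rel_error(ref: float, got: float) -> float:
    """Relative error |got - ref| / |ref|, with 0 vs 0 treated as exact."""
    if ref == 0.0:
        return 0.0 if got == 0.0 else math.inf
    return abs(got - ref) / abs(ref)


if __name__ == "__main__":
    for ref in (0.1, 1e-8, 70000.0):
        got = fp16_round_trip(ref)
        print(f"{ref!r} -> {got!r}  rel_err={rel_error(ref, got):.3g}")
```

A moderate value such as 0.1 keeps a relative error of about 2.4e-4. However, 1e-8 flushes to zero, giving a relative error of 1, and 70000.0 exceeds the fp16 maximum of 65504. A relative-tolerance comparison against fp32 therefore fails on exactly those elements, even when the kernel itself is correct.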
Open questions (draft)